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Abstract. A perturbation technique can be used to simplify and sharpen A. C. Yao's 
theorems about the behavior of shellsort with increments (h, g, 1). In particular, when 
h = 0(n 7 / 15 ) and g = 0(/i 1//5 ), the average running time is 0(n 23//15 ). The proof involves 
interesting properties of the inversions in random permutations that have been /i-sorted 
and o-sorted. 

o\ ■ 

ON 

Shellsort, also known as the "diminishing increment sort" [7, Algorithm 5. 2. ID], puts the elements 
of an array (Xo, . . . , A n _i) into order by successively performing a straight insertion sort on larger 

■ and larger subarrays of equally spaced elements. The algorithm consists of t passes defined by 

■ increments (ht—i, ■ ■ ■ , hi, ho), where ho = 1; the jth pass makes A& < X\ whenever I — k = ht-j- 

A. C. Yao [11] has analyzed the average behavior of shellsort in the general three-pass case 
when the increments are (h,g,l). The most interesting part of his analysis dealt with the third 
pass, where the running time is 0{n) plus a term proportional to the average number of inversions 
that remain after a random permutation has been /i-sorted and (7-sorted. Yao proved that if g 
, and h are relatively prime, the average number of inversions remaining is 

^{h,g)n + 0{n 2 ^), (0.1) 

in 

where the constant implied by O depends on g and h. He gave a complicated triple sum for i^{h, g), 

00 ' which is too difficult to explain here; we will show that 

O ' 
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Moreover, we will prove that the average number of inversions after such /i-sorting and ^-sorting is 

^(h,g)n + 0(g 3 h 2 ), (0.3) 

where the constant implied by O is independent of g, h, and n. 

The main technique used in proving (0.3) is to consider a stochastic algorithm A whose output 
has the same distribution as the inversions of the third pass of shellsort. Then by slightly perturbing 
the probabilities that define .A, we will obtain an algorithm A* whose output has the expected value 
tjj(h,g)n exactly. Finally we will prove that the perturbations cause the expected value to change 
by at most 0(g 3 h 2 ). 

Section 1 introduces basic techniques for inversion counting, and section 2 adapts those tech- 
niques to a random input model. Section 3 proves that the crucial random variables needed for 
inversion counting are nearly uniform; then section 4 shows that the leading term ip(h,g)n in (0.3) 
would be exact if those variables were perfectly uniform. Section 5 explains how to perturb them 
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so that they are indeed uniform, and section 6 shows how this perturbation yields the error term 
0(g 3 h 2 ) of (0.3). 

The asymptotic value of ip(h,g) is shown to be (irh/128) 1 / 2 g in section 7. The cost of the 
third pass in (ch, eg, l)-shellsort for c > 1 is analyzed in section 8. This makes it possible to bound 
the total running time for all three passes, as shown in section 9, leading to an 0(n 23 / 15 ) average 
running time when h and g are suitably chosen. 

The bound 0(g 3 h 2 ) in (0.3) may not be best possible. Section 10 discusses a conjectured 
improvement, consistent with computational experiments, which would reduce the average cost to 
0{n 3 / 2 ) if it could be proved. 

The tantalizing prospect of extending the techniques of this paper to more than three incre- 
ments is explored briefly in section 11. 

1. Counting inversions. We shall assume throughout this paper that g and h are relatively 
prime. To fix the ideas, suppose h = 5, g = 3, n = 20, and suppose we are sorting the 2-digit 
numbers 

(X , X 1 ,..., X n _i) = (03, 14, 15, 92, 65, 35, 89, 79, 32, 38, 46, 26, 43, 37, 31, 78, 50, 28, 84, 19) . 
(Cf. [6, Eq. 3.3-(l)].) The first pass of shellsort, /i-sorting, replaces this array by 

(X' 0l X[,..., = (03, 14, 15, 32, 19, 35, 26, 28, 37, 31, 46, 50, 43, 84, 38, 78, 89, 79, 92, 65) . 

The second pass, ^-sorting, replaces it by 

{X%, X'{, X'^) = (03, 14, 15, 26, 19, 35, 31, 28, 37, 32, 46, 38, 43, 65, 50, 78, 84, 79, 92, 89) . 

Our task is to study the inversions of this list, namely the pairs k, I for which k < I and X'l > X'/. 

The result of ^-sorting is the creation of g ordered lists X" < X'- +g < X'- +2g < • • • for 
< J ' < g, each of which contains no inversions within itself. So the inversions remaining are 
inversions between different sublists. For example, the 20 numbers sorted above lead to 

list = (03, 26, 31, 32, 43, 78, 92) , 
list 1 = (14, 19, 28, 46, 65, 84, 89) , 
list 2 = (15,35,37,38,50,79) ; 

the inversions between list and list 1 are the inversions of 

(03, 14, 26, 19, 31, 28, 32, 46, 43, 65, 78, 84, 92, 89) . 

It is well known [7, §5.21] that two interleaved ordered lists of length m have YlT^ \ r ~ Sr \ 
inversions, where s r of the elements of the second list are less than the (r + l)st element of the first 
list; for example, (03, 14, 26, . . . , 89) has 

|0 - 0| + |1 - 2| + |2 - 3| + |3 - 3| + |4 - 3| + |5 - 5| + |6 - 7| = 4 
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inversions. If r > s r , the (r + l)st element of the first list is inverted by r — s r elements of the second; 
otherwise it inverts s r — r of those elements. (We assume that the list elements are distinct.) The 
same formula holds for interleaved ordered lists of lengths m and m — 1, because we can imagine 
an infinite element at the end of the second list. 

Let Yy be the number of elements Xy such that k' = k (mod h) and Xy < X[. The n numbers 
Yu for < I < n clearly characterize the permutation performed by /i-sorting; and it is not hard 
to see that the full set of hn numbers Yy for < k < h and < / < n is enough to determine the 
relative order of all the X's. 

There is a convenient way to enumerate the inversions that remain after g-sorting, using the 
numbers Yu- Indeed, let 

Jy = (k mod h + hYki) mod g . (1-1) 

Then Xi will appear in list j = Ju after g-sorting. Let Sji be the number of elements Xy such 
that Xy < Xi and Xy is in list j. The inversions between lists j and j' depend on the difference 
\Sji — Sjq\ when X\ goes into list j. 

Given any values of j and j' with < j < j' < g, let j s = (j + hs) mod g, and let d be 
minimum with jd = j' . Thus, d is the distance from j to j' if we count by steps of h modulo g. Let 

H = {j 1 ,j 2 ,...,j d } (1.2) 

be the h numbers between j and j' in this counting process, and let Qi be the number of indices k 
such that < k < h and Ju G H. Then we can prove the following basic fact: 

Lemma 1. Using the notation above, we have 

Sji - S ri = Qi- [hd/g\ (1.3) 

for all j, j' , and I with < j < j' < g and < I < n. 

Proof. Since the X's are distinct, there is a permutation (lo,h, . . . , l n -i) of {0, 1, . . . , n — 1} such 
that Xi < Xi t < • • • < Xi n _ 1 . We will prove (1.3) for I = l t by induction on t. 

Suppose first that I = l , so that X\ is the smallest element being sorted. Then Yy = for 
all k; hence Jy = k mod g for < k < h. Also Sji = Sj>i = 0. Therefore (1.3) is equivalent in this 
case to the assertion that precisely [hd/g\ elements of the multiset 

{0 mod g, 1 mod g, . . . , [h — 1) mod g] 

belong to H. 

A clever proof of that assertion surely exists, but what is it? We can at any rate use brute force 
by assuming first that j = 0. Then the number of solutions to x = hd (mod g) and < x < h is 
the number of integers in the interval [—hd/g . . —h(d — l)/g), namely \—h(d — l)/<?] — l~ — hd/g] = 
[hd/g\ — [h(d — l)/g\- Therefore the assertion for j = follows by induction on d. And once we've 
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proved it for some pair j < f, we can prove it for j + 1 < f + 1, assuming that / + 1 < 5: The 
value of d stays the same, and the values of ji, J2, • • • ,jd increase by 1 (mod g). So we lose one 
solution if j s = h — 1 (mod 5) for some s with 1 < s < d; we gain one solution if j s = — 1 (mod g) 
for some s. Since j s = h — 1 <^=^ = —1, the net change is zero unless ji = h — 1 (but then 
j = g — 1) or j d = —1 (but then j' = g — 1). This completes the proof by brute force when / = l . 

Suppose (1.3) holds for / = l t ; we want to show that it also holds when / is replaced by /' = h+i- 
The numbers Y kl and Y kl > are identical for all but one value of k, since 

Y kV =Y kl + [l = k (mod h)} . 

Thus, the values of J k i and Jw are the same except that J k i increases by h (mod g) when k = I 
(mod h). It follows that 

Qi> = Qi + [Ju = j] - [Ju = f] ■ 

This completes the proof by induction on t, since Sji> = Sji + [Ju = j] for all j. □ 
Corollary. Using the notations above, the total number of inversions between lists j and j' is 

n-l 

Y,\Qi-Vhd/g\\[Ju=j]. (1.4) 

z=o 

Proof. This is \Sji — Sj>i\ = \r — s r \ summed over all r such that Xi is the (r + l)st element of 
list j. □ 

In the example of n = 20 two-digit numbers given earlier, with h = 5, g = 3, j = 0, and j' = 1, 
we have d = 2, H = {2, 1}, 

1=0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 

Xi = 03 14 15 92 65 35 89 79 32 38 46 26 43 37 31 78 50 28 84 19 
Y01 = 01143144122122133141 
Yu = 00143233222122232231 
Y 2 i = 00043243223122233141 
Y 3i = 00032132022021022020 
Y 4l = 00043244223132143140 
Jo; = 02 2 2 022 2 2 112 1 1 200 222 
J u =11001211222022212210 
J 2 / = 22212012002100022111 
J 3 , = 1 20 101 1 120 1 1 010 
J41 = 1 1 1 01200221 1 200 1 001 
= 34324434345244234343 

and the underlined values Ju are for I = 0, 3, 8, 11, 12, 14, 15 (accounting for the seven elements 
in list 0). The inversions between lists and 1 are therefore 

|3 - 3| + |2 - 3| + |3 - 3| + |2 - 3| + |4 - 3| + |2 - 3| + |3 - 3| = 4 

according to (1.4). 
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2. Random structures. We obtain a random run of shellsort if we assume that the input array 
(Xo,X\, . . . ,X n -i) is a random point in the n-dimensional unit cube. For each integer I in the 
range < I < n and for each "time" t in the range < t < 1, we will consider the contribution 
made by Xi to the total number of inversions if Xi = t. 

Thus, instead of the quantities Yu and Jki defined in the previous section, we define 

Yki(t)= £ [X k ,<t], (2.1) 

k'=k ( mod h) 
0<k'<n 

Jki(t) = (k mod h + hY k i(t)) mod g. (2.2) 

These equations are almost, but not quite, independent of I, because we assume that X[ =t while 
all other X's are uniformly and independently random. 

For each pair of indices j and f with < j < f < g, we define H as in (1.2), and we let 

h-l 

Qi (t) = E [ Jkl (*) ^ H \[ k ^ 1 mod h l ■ ( 2 - 3 ) 

fc=0 

This definition is slightly different from our original definition of Q\ , because we have excluded the 
term for k = I mod h. However, formula (1.4) remains valid because j £ H; when Ju = j, the 
excluded term is therefore zero. 

Notice that, for fixed I, the random variables Yki(t) for < k < h are independent. Therefore 
the random variables Jki(t) are independent; and Qi(t) is independent of Ju(t). The average 
contribution of Xi to the inversions between lists j and j' when X; = t is therefore 

W jfl (t) = Pr [Ju (t) = j] E | (t) - | (2.4) 

by (1.4), where probabilities and expectations are computed with respect to (X , . . . , 
Xi + i, . . . ,X n _i). The average total contribution of X; is obtained by integrating over all values 
oft: 

Lemma 2. Let 

W m = I W m {t)dt. (2.5) 
Jo 

Then the average grand total number of inversions in the third pass of shellsort is 

E W n'i- □ ( 2 - 6 ) 

0<j<j'<g 

0<l<n 

Our goal is to find the asymptotic value of this sum, by proving that it agrees with the estimate 
(0.3) stated in the introduction. 

3. Near uniformity. The complicated formulas of the previous section become vastly simpler 
when we notice that each random variable Jki(t) is almost uniformly distributed: The probability 
that Jki(t) = j is very close to 1/g, for each j, as long as t is not too close to or 1. To prove this 
statement, it suffices to show that Yfcj(t) mod g is approximately uniform, because h is relatively 
prime to g. Notice that Yki{t) has a binomial distribution, because it is the sum of approximately 
n/h independent random 0-1 variables that take the value 1 with probability t. 



5 



Lemma 3. If Y has the binomial distribution with parameters (m,t), then 



Pr \Y mod g = j] 



1 



< - 4>gm(t) 



for < j < g, where 



J gm 



m/g 2 



k=l 



Proof. Let yj = Pr \Y mod g = j] , and consider the discrete Fourier transform 



LO 



kY 



where u = e 27Tl ^ g . We have 



Vk 



1=0 



J2 ( ? V C 1 ~ t) m ~ l uj kl = (u k t + 1 - t) m , 



and 



\u k t + 1 - t\ 2 = t 2 + (1 - tf + t(l - t)(cu k + cu- k ) 
= 1 - 2t(l - *)(1 - cos2tt/c/c/) 
= 1 - At(l - t) sin 2 vr/c/fir. 
If < x < tt/2 we have sinx > 2x/7r; hence, if < k < 

\ u " t + 1 _ t |2 < ! _ 16t(1 _ t)k 2 /g 2 < e -16 t (l-t)W 

And if |g < A; < g we have |?/fc| = |y 9 -fc|- Therefore 

s-i 9/2 

^| 2 / fc |<2^e- 8 *( 1 -*) fe2 ^ 2 <^ m (t). 



fc=i 

The desired result follows since 
and thus 

Corollary. We have 
for < k < h, where 



k=i 



1 3-1 



y k=0 



1 




1 


v '-- s 
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^ 9-1 ^ 9-1 



fc=i 



Pr [Jki{t)=j]-- g 



fc=i 



1 



/ 2 Er=i e" 4 ^ 1 -*)^"^^ , if n > 4/i; 



5, 



if n < Ah. 



(3.1) 
(3.2) 



(3.3) 



(3.4) 



(3.5) 



(3.6) 
(3.7) 



Proof. Each variable Yfcz(i) in (2.1) for < k < h has the binomial distribution with parameters 
(m, t), where if n > Ah 



ft ft 

m = \{n - k)/h] - [k = I mod h] > - - 2 > 



h ~ 2h 

Now Jki(t) = j if and only if Yki(t) has a certain value mod g. The case n < Ah is trivial. □ 
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4. Uniformity. Let's assume now that, for given I and t, the random variables Jkiif) have a 
perfectly uniform distribution. Since the variables Jki (t) are independent for < k < h, this means 
that 

P*[Joi(t)=jo,Jii(t)=ji,...,J( h -i)i(t) =j h -i] = (4.1) 

for all /i-tuples (j ,ji,. . . , jh-i)- 

In such a case the random variable Qi(t) defined in (2.3) is the sum of h — 1 independent 
indicator variables, each equal to 1 with probability d/g because H has d elements. Hence Qi(t) 
has the binomial distribution with parameters (h — 1, d/g), and it is equal to r with probability 



h-1 

r 



9 J 



h-l-r 



(4.2) 



Let W*j,[(t) be the value of Wjyj(t) under the assumption of uniformity (see (2.4)). Thus 
W*j ri (t) is independent of t, and we let W*-, t = Wjj n (t) in accordance with (2.5). Then 



h-1 

q ^ 

y r=0 



h-1 

r 



h-l-r 



hd 

9 



(4.3) 



For given values of d and j, the index j' = (j + hd) mod g is at distance d from j. Suppose that 
a(d) of these pairs (j,f) have j < j' . Then g — a(d) of them have j > j' , and a(g — d) = g — a(d) 
since j is at distance g — d from f. The sum of (4.3) over all j < j' is therefore independent of a(d): 



0<j<j'<g d=l y r=0 V 



/i-l-r 



3-1 , h-1 

^ a(g - d) 



d=i 



g-l h-1 



r=0 



h-1 

r 



h-1 
h-l-r 



g-d 



h-l-r 



hd 
L 9 

g-d 



h—l — r — 
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h-l-r 



Kg - d) 



hd 
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(We have used the fact that [h(g — d)/g\ = h — 1 — \ hd/g\ when hd/g is not an integer.) But this 
is just the quantity ip(h,g) in (0.2), for each value of I. We have proved 

Lemma 4. If we assume that the variables Jki(t) have exactly the uniform distribution, the quan- 
tity (2.6) is exactly if) (h, g)n. □ 



5. Perturbation. To complete the proof of (0.3), we use a general technique applicable to 
the analysis of many algorithms: If a given complicated algorithm A almost always has the same 
performance characteristics as a simpler algorithm A*, then the expected performance of A is the 
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same as the performance of A* plus an error term based on the cases where A and A* differ. (See, 
for example, the analysis in [8], where this "principle of negligible perturbation" is applied to a 
nontrivial branching process.) 

In the present situation we retain the (n — l)-dimensional probability space (Xo, . . . ,Xi-i,t, 
. . . ,X n _i) on which the random variables Jki(t) were defined in (2.2), and we define a new 
set of random variables on the same space, where J^ t (t) has exactly a uniform distribution 

on {0, 1, ... ,g — 1}. This can be done in such a way that Jki{t) = Jli{t) with high probability. 

More precisely, when / and t are given, J^i (t) depends only on the variables with k' = k 
(mod h) and k' / I. The unit cube on those variables is partitioned into g parts P , Pi, . . . , P g -i 
such that Jki(t) = j when the variables lie in Pj; the volume of Pj is Pr [Jki(t) = j]. We will divide 
each Pj into g sets Pj , Pj l5 . . . , P'^ g _ 1 y and define J£i(t) = i on P^. This subdivision, performed 
separately for each k, will yield independent random variables </o;(i), Ju(t), J^ h _ 1 ^(t). We 
will show that the subdivision can be done in such a way that 

Pr[JZ l (t)=j] = V9, (5-1) 
Pv[J* kl (t)^J kl (t)]<^(t), (5.2) 

for < j < g and < k < h. Thus, we will have perturbed the values of Jki(t) with low probability 
when (p(t) is small. 

The following construction does what we need, and more: 

Lemma 5. Let pi,...,p m and p*, . . . ,p* m be nonnegative real numbers with Pi + ■ ■ ■ + p m = 
Pi + " " " + Pm = 1- Then there are nonnegative reals p'^ for 1 < i,j < m such that 

m 

Pi = J2Pij, (5-3) 

m 

i=i 

and | 

Proof. This is a special case of "maximal coupling" in probability theory [5; 9, §111.14]; it can be 
proved as follows. 

Let p'^ = mm(pj,p*), and observe that 

J2 p jj = m[n (pj . p*j ) = \ & + p j - \pj - p*j i ) = 1 ~ 2 Y \ pj ~ p *>\~ ( 5-6 ) 

3 3 j " j 

The existence of nonnegative p'^, i / j, such that (5.3) and (5.4) hold follows from the max ffow-min 
cut theorem [4]: Consider a network with a source s, a sink t, and 2m nodes vi, . . . , v m , v\, . . . , v^; 
the edges are svj with capacity pj —p'jj , v*t with capacity p* —p'jj , and ViV* with infinite capacity. □ 
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6. The effect of perturbation. When independent random variables J^(t) have been defined 
satisfying (5.1) and (5.2), we can use them to define QJ(t) as in (2.3) and W* / ; (r) as in (2.4). This 
value Wfjuty) has already been evaluated in (4.3); we want now to use the idea of perturbation to 
see how much Wjjn(t) can differ from W^yj(t). 

Since Q,(t) = 0{h) and 

h-l 

\Qi{t) - QUt)\ < E [ J «W * W)\ . (6-1) 

k=0 

we have 

I Wjj'iijt) -W£,,(t)| = | (Pr [ J« (*) = J] - Pr [J£ (t) = j] ) E | Q t (t) - [hd/g\ | 

+ Pr [J* (t ) = j\ (E | Q l (t ) - LM/ 5 j | - E | Q* z (t) - L^/sJ | ) 



i 1 

<-^)0(/ l ) + -^Pr[4(^J* i (*)] 



5 a k=o 

= o(j^J<P(t). (6.2) 

(We assume that J^(t) = J* fc mod h ^(t) when A; > h.) 

To complete our estimate, we need to integrate this difference over all t. 

Lemma 6. J* 0(t) (it = 0(g 2 h/n). 

Proof. The case n < Ah is trivial. Otherwise we have 



,1 ,1/2 

/ (f>(t) dt = 2 <f>(t) dt 
Jo Jo 

L £ 

<4 y E e ~ 2 ' fcVs2/lcft 



1/2 co 

<4/ Ve-^'di 



, Q 2 h 7T 2 <7 2 /i 

= 4 E = ^ — - □ 

^— ' 2k z n 3 n 
fc=i 

Theorem 1. The average number of inversions remaining after h-sorting and then g-sorting a 
random permutation of n elements, when h is relatively prime to g, is tp(h,g)n + 0(g 3 h 2 ), where 
ip(h,g) is given by (0.2). 

Proof. By (6.2) and Lemmas 2, 4, and 6, the average is if)(h,g)n plus 

E l\w jri {t)-W* ri {t))dt = 0{g 2 n)0{h/g) [\(t)dt 
■ . ^ ., ^ Jo Jo 



0<j<j <g 

0<l<n 



Notice that the proof of this theorem implicitly uses Lemma 5 for each choice of I and t, 
without requiring any sort of continuity between the values of J^it) as t varies. We could have 
defined J^if) in a continuous fashion; indeed, the random variables [Xf. < t] partition the (n — 1)- 
cube into 2 n_1 subrectangles in each of which Jki(t) has a constant value, so we could define J£i(t) 
over (n — l)-dimensional rectangular prisms with smooth transitions as a function of t. But such 
complicated refinements are not necessary for the validity of the perturbation argument. 



7. Asymptotics. Our next goal is to estimate ip(h,g) when h and g are large. Notice that 



Y>(M) = ^E E 



hd 



2 d -, 



Z(h-l,d/g)- 



L 9 



(7.1) 



where Z(m,p) has the binomial distribution with parameters m and p. The mean of Z(h — 1, d/g) 
is (h — l)d/g = [hd/g\ +0(1), and the variance is (h — \)d(g — d) / g 2 . If we replace Z by a normally 
distributed random variable with this same mean and variance, the expected value of \Z — \hd/g\ \ 
is approximately (27r) -1 / 2 \t\e~ t2 / 2 dt = 2/v / 27r times the standard deviation, so (7.1) will be 
approximately 

^E^)- ( 7 - 2 ) 

d=l 

The detailed calculations in the remainder of this section justify this approximation and provide a 
rigorous error bound. 

Lemma 7. If Z has the binomial distribution with parameters (m,p), and [mp\ < a < \mp], 
then 

E|Z-„| = yME2™ +0 (_J_Y (7 . 3) 
V vr Ky/mp^-p)/ 

Proof. Consider first the case a = mp. By a formula of De Moivre [1, page 101] and Poincare [10, 
pages 56-60], see Diaconis and Zabell [2], 

E\Z-mp\ =2\mp}(^2 p ^P ]mpA ( l -p) m+1 ~ ]mp] - ( 7A ) 

In order to prove (7.3) in this case we may assume thatp < 1/2, since \Z — mp\ = \m — Z — m(\—p)\. 
Moreover, we may assume that mp > 1 since (7.3) otherwise is trivial. Then, a routine application 
of Stirling's approximation shows that 



E|z _ mp| . v Ml^ exp ( 0( i_)). ( , 5) 

Next observe that if [mp\ < a < \mp] , we have 

E\Z-a\ =E\Z - mp\ + (mp- a) (l-2Pr[Z <mp\) . (7.6) 

Since Pr [Z < mp] = \ + 0((mp(l — p)) -1 / 2 ), for example by the Berry-Esseen estimate of the 
error in the central limit theorem [3, §XVI.5], the result follows. f~J 
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Corollary. The asymptotic value of ip(h,g) is 



m 9) = {^9 + O(g-^h^) + 0{<,h-W) . (7.7) 

Proof. Since [hd/g\ < [(h + l)d/g\ < [{hd + g-l)/g\ = \hd/g], Lemma 7 yields 

1 3-1 

And Euler's summation formula with f(x) = y 7 (x/g)(l — x/g) tells us that 

E M = f 1 d * + + i/k - X ) + T^/'(5 " 1) " ^/'(l) " * 

= 5 jf 1 v 7 ^) * + 0Gr 1/2 ) = f + 0Gr 1/2 ) 

because 

<i/°" 1 |/"(x)|<ix = i/'(l)-i/'( ! ,-l). □ 
The error term is thus 0(g 1 ^ 2 ) when h = g 2 + 1; for example, we have 



h 


5 


4>{h,g) 


y/irh/128g 


difference/ y/g 


901 


30 


140.018 


141.076 


0.1933 


1601 


40 


249.539 


250.741 


0.1900 


2501 


50 


390.412 


391.739 


0.1877 



8. Common factors. Now let's consider the behavior of shellsort with increments (ch,cg,l), 
where c is an integer > 1. It is easy to see that the first two passes are equivalent to the first two 
passes of (h,g,l) shellsort on c independent subarrays (X a , X a+C , X a+2c , ■ ■ •) of size \(n — a)/c] 
for < a < c. The inversions that remain are the tp(h,g)n + 0(g 3 h 2 c) inversions within these 
subarrays, plus "cross-inversions" between (2) pairs of subarrays. 

Yao [11, Theorem 2] proved that the average number of cross-inversions is | y/irc (1— c _1 )n 3 / 2 + 
0{cghn). The following lemma improves his error term slightly. 



Z(h,d/g)- 



(h + l)d 



\R\ 



- '^■nodl) />)<ii 
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Lemma 8. The average number of cross-inversions after ch-sorting and cg-sorting is 



1 



- ypFc (1 - c- 1 )™ 3 / 2 + 0(cgh x l 2 n) + 0(c 2 g 3 h 



1/2, 



,2„3t,2> 



Proof. Let's consider first the process of /i-sorting and (/-sorting two independent arrays (Xq,Xi, 
. . . , X n _i) and (Xq,Xi, . . . , X n _i), then interleaving the results to obtain [Xq , Xq , X", X[', . . . 
X'^_ l ,X'^_ l ). The cross inversions are then the pairs {X[' ,X[',} where either X" > X", and I < V 
or X'{ < X'{, and I > V . 

Recasting this process in the model of section 2 above, we assume that Xi = t, while the other 
In — 1 variables (Xq, . . . , . . . , X n _i, X , . . . , X n _i) are independent and uniformly distributed 

between and 1. We define 



Y ki (t) 



[Xk><t], Y kl {t)= Y, [Xk><t] 



(8.2) 



k' = k ( mod h) 
0<k'<n 



k' = k ( mod h) 
0<k'<n 



as in (2.1). The elements of each array are divided into h subarrays by /i-sorting, and the elements 
< t have Yki(t) and Yki{t) elements in the fcth subarrays. Then (7-sorting will form g lists, with 



h-l 



L 3 i{t) = Y 



k=0 



Y k i(t) - a kj 



(8.3) 



elements < t in the jth list of the first array, where akj £ {0, 1, . . . , g — 1} is given by k + a^jh = j 
(mod g). Similarly, there will be 



h-l 



fc=0 



lfcl(*)-Ofcj 



.9 



(8.4) 



elements < t in the jth list of the second. Element Xi = t of the first array will go into list j = J«(t) 
as before, where Jki(t) is defined in (2.2). The number of cross-inversions between this element and 
the elements of the second array will then be 



9-1 



V l (t) = Y\Lj'i(t)-L jl (t)-\j'<j]\. 

j'=0 



(8.5) 



The average total number of cross-inversions is the sum of E V\ (t) over all I, integrated for < t < 1. 

We know from Lemma 3 that the numbers Yki(t) mod g have approximately a uniform distri- 
bution. Therefore 



Y k i(t) - a k j 



9 



Ykijt) — akj + Rjkijt) 



9 



where Rjki(t) is approximately uniform on {0, 1, ... ,g — 1}. It follows that 



Lji(t) 



Ziit) 



h-l 

k=0 



Rjkl(t) — a kj 



(8.6) 
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where 

h-l 

Zt(t) = £ Y kl {t) 

k=0 

is the total number of elements in the first array that are < t. 

Since Rj k i(t) depends on Y kl (t) mod g only, or equivalently on J k i(t), we may use the per- 
turbed truly uniform random variables J k i(t) in section 5 (or repeat the argument there with 
Rjki{t)) and construct random variables Rj k [(t) that are uniform on {0, 1, . . . ,g — 1} and satisfy 
Pr [R* kl (t) / Rjki(t)} < <t){t); moreover, the variables R* k i(t) are independent for < k < h and 
fixed j and I. Consequently 

E \R* kl (t) - R jkl (t)\ < gPr [R* kl (t) + R jkl {t)\ < g<f>(t) . (8.7) 

By independence and the fact that ER* kl (t) = (g — l)/2, 

(h-l \ 2 h-l 

£ R* kl (t) ~ Ha - i)/2 = E E - (9 - i)/2) 2 < V, 

fc=0 / fc=0 

which by the Cauchy-Schwarz inequality yields 

h-l 



E 



£i?* fcJ (t)-%-l)/2 



< Vhg. (8.8) 



fc=0 

Let W jt = i(Eto ^W*) " % " l)/2) and 6, = I(% - l)/2 - E^o <*fci); ^en 

M*) = ^ + Wii + 6i. (8-9) 

where by (8.7) and (8.8) 

E\Wji(t)\ < Vh + hcf>(t). 



A similar argument shows that 



L jl (t) = ^- + W jl + b j . 



Hence 

5-1 



W = E W)-^(*)l +0( |^,| + |^,| + 1) 
j'=0 \ 9 / 

and 

EVj(t) = E \Zi(t) — Zi(t)\ + O(gVh) + 0(gh)(f>(t) . (8.10) 

The quantity — Zi(t)\ is just what we would get if we were counting the cross-inversions 

between two fully sorted arrays that have been interleaved. Therefore 



L 



1 n—l 

E|Z,(*)-Z,(t)|<ft 



o 
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must be the average number of inversions of a random 2-ordered permutation of 2n elements; 
this, according to Douglas H. Hunt in 1967, is exactly n2 2n_2 /( 2 ^ 1 ) [7, exercise 5.2.1-14]. Since 
( 2 ™) = (l + O(l/n))4 n /0m, we obtain the desired total 



L 



1 ^ ,/^ n 3/2 



E Vi{t) dt = V ' + 0{gh x l 2 n) + 0{g 3 h 2 ) (8.11) 



1=0 



by Lemma 6. Similarly, the same result holds for two arrays of different sizes n + 0(1). 
Lemma 8 follows if we replace n by n/c + 0(1) in (8.11) and multiply by (°). □ 

9. The total cost. So far we have been considering only the number of inversions removed 
during the third pass of a three-pass shellsort. But the first two passes can be analyzed as in Yao's 
paper [11]: 

Theorem 2. Let g and h be relatively prime and let c be a positive integer. The average number 
of inversions removed when (ch, eg, l)-shellsort is applied to a random n-element array is 

(9.1) 

on the first pass, 

Y g \[^( h ~ l ) n " 2 + 0(hn) (9.2) 



on the second, and 



i/j(h, g)n + ^^(c- l)n 3 / 2 + 0((c - l)gh l / 2 n) + 0(c 2 g 3 h 2 ) (9.3) 



on the third. 



Proof. The first pass removes an average of ^(re/c/i + 0(l)) inversions from ch subarrays of size 
[n/ch\ or \n/ch~\; this proves (9.1). The second pass is equivalent to the second pass of (h,g,l)- 
shellsort on c independent subarrays of sizes [n/c\ or \n/c]. Equation (9.3) is Lemma 8. So the 
theorem will follow if we can prove (9.2) in the case c = 1. And that case follows from [11, equation 
(32)], with the O(n) term replaced by 0(n/kh) in the notation of that paper. (See also [7, second 
edition, exercise 5.2.1-40.) □ 

Corollary. If h = 0(n 7 / 15 ), g = B(n 1 / 5 ), and gcd(g, h) = 1, the running time of (h,g, l)-shellsort 
isO(n 23 / 15 ). 



Proof. The first pass takes time 0(n 2 " 7 / 15 ), by (9.1); the second takes 0(n 3 / 2+7 / 30 - 1 / 5 ) + 
0(n 1+7 / 15 ), by (9.2); and the third takes 0(n 1+1 / 5+7 / 3 °) + 0( n 3 / 5+14 / 15 ) by (7.6) and (9.3). □ 
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10. Two conjectures. Our estimate 0(g 3 h 2 ) for the difference between ijj(h, g)n and the average 
number of third-pass inversions may not be the best possible. In fact, the authors conjecture 
that the difference is at most 0(g 3 h 3 ^ 2 ). This sharper bound may perhaps follow from methods 
analogous to those in the proof of Lemma 8. 

If such a conjecture is valid, the running time of (h, g, l)-shellsort will be 0(n 3 / 2 ) when h ~ n 1 / 2 
and g ps n 1 / 4 . A computer program was written to test this hypothesis by applying (h, g, l)-shellsort 
to random arrays of n elements with h = g 2 + 1 and n = g 2 h = g 4 + g 2 . The following empirical 
results were obtained, to three significant figures: 



9 


inversions 


/ / 7 \ 

ip{h,g)n 


9 


inversions/ 10 


ip(h,g)n/lQ° 


1 


0±0 





17 


36.6 ± 2.36/32 


37.3 


2 


7.12 ±2.09/100 


7.5 


18 


51.7 ±3.35/32 


52.6 


3 


94.4 ± 13.6/100 


98.3 


19 


71.5 ±4.81/32 


72.9 


4 


563 ±59.1/100 


581 


20 


97.3 ±6.14/10 


99.2 


5 


2210 ± 195/100 


2280 


21 


130 ±8.93/10 


133 


6 


6740 ± 560/100 


6910 


22 


174 ±12.3/10 


176 


7 


17200 ± 1300/100 


17600 


23 


226 ±14.0/10 


230 


8 


38600 ±2820/100 


39500 


24 


291 ± 16.8/10 


297 


9 


78900 ±5670/100 


80600 


25 


368 ±23.7/10 


380 


10 


149000 ± 10600/100 


152000 


26 


475 ±29.1/10 


480 


11 


265000 ± 17200/32 


271000 


27 


595 ±39.0/10 


603 


12 


447000 ± 30300/32 


458000 


28 


735 ±44.9/10 


750 


13 


727000 ± 49300/32 


742000 


29 


922 ±52.1/10 


926 


14 


1140000 ±75400/32 


1160000 


30 


1110 ±74.0/10 


1140 


15 


1730000 ± 116000/32 


1760000 


31 


1370 ±97.9/10 


1380 


16 


2530000 ± 166000/32 


2590000 


32 


1650 ± 101/10 


1670 



(The inversion counts are given here in the form fi ± a / \fr, where \i and a are the empirical mean 
and standard derivation in r independent trials. For example, 10000 trials were made when g < 10, 
but only 100 trials were made when g > 20.) Both mean and standard derivation seem to be 
growing proportionately to g 6 ~ n 3 / 2 , with a ~ ^/15 for g > 10. 

These data suggest also another conjecture, that the average number of inversions is < i()(h,g)n 
when h and g are relatively prime. Indeed, the deviations from uniformity between A and .A* should 
tend to cause fewer inversions, because A forces the balance condition Yfcz(l) = n/h + 0(1) for all 
k and /. This second conjecture obviously implies running time 0(n 3 / 2 ) when h = 6(n 1 / 2 ) and 
g = e(n 1 / 4 ). 

11. More than three increments? It may be possible to extend this analysis to (h, g, f, 1)- 

shellsort, by analyzing the following stochastic algorithm. "Initialize two sets of counters 
. . . , Ig-\) and (Jo, Ji, . . . , Jh-i) by setting Ij <— j mod / and </& = k mod g for all j and k. Then 
execute the following procedure n times: Choose a random k in the range < k < h. Set j <— Jk 
and i <— Ij; then set Jk <— (Jk + h) mod g and Ij <— (Ij ± g) mod /." 
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Consider the transition from I = l t to V = lt+% in the proof of Lemma 1. When elements 
enter the array in increasing order, the choice of k represents the subarray that will contain a new 
element X during the /i-sort; then X goes into list j during the g-sort, and into list i during the 
/-sort. We can therefore obtain the contribution of X to the inversions between lists i and i' for 
i < i' < f, by considering a state Pi obtained from the / table just as Qi was obtained from the 
J table in Lemma 1. 
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